PROCESSOR AND METHOD OF EXECUTING A LOAD INSTRUCTION THAT
BIFURCATE LOAD EXECUTION INTO TWO OPERATIONS 

BACKGROUND OF THE INVENTION 

1. Technical Field: 

The present invention relates in general to 
data processing and, in particular, to a processor and
method of performing load operations in a processor. 
Still more particularly, the present invention relates to 
a processor and method of processing a load instruction 
that bifurcate load execution into two separate 
operations.

2. Description of the Related Art: 

Most processors' instruction set architectures 
(ISAs) include a load or similar type of instruction 
that, when executed, causes the processor to load 
specified data from memory (e.g., cache memory or system 
memory) into the processor's internal registers. 
Conventional processors handle the execution of load 
instructions in one of two ways. First, a processor may 
execute load instructions strictly in program order. In 
general, the execution of load instructions with strict 
adherence to program order is viewed as disadvantageous 
given the fact that at least some percentage of data 
specified by load instructions will not be present in the 
processor's cache. In such cases, the processor must 
stall the execution of the instructions following the 
load until the data specified by the load is retrieved 
from memory. 

Alternatively, a processor may permit load 
instructions to execute out-of-order with respect to the 
programmed sequence of instructions. In general, out-of-order
execution of load instructions is viewed as
advantageous since operands required for execution are 
obtained from memory as soon as possible, thereby 
improving overall processor throughput. However, 
supporting out-of-order execution of load instructions 
entails additional complexity in the processor's 
architecture since, to guarantee correctness, the 
processor must be able to detect and cancel an out-of- 
order load instruction that loads data from a memory 
location targeted by a later-executed store instruction 
(executed in the same or a remote processor) preceding 
the load instruction in program order.

SUMMARY OF THE INVENTION

The present invention addresses the poor 
performance associated with in-order processors and 
eliminates much of the complexity associated with out-of- 
order machines by providing an improved processor and 
method of executing load instructions. 

In accordance with the present invention, a 
processor implementing an improved method for executing 
load instructions includes execution circuitry, a 
plurality of registers, and instruction processing 
circuitry. The instruction processing circuitry fetches 
a load instruction and a preceding instruction that 
precedes the load instruction in program order, and in 
response to detecting the load instruction, translates 
the load instruction into separately executable prefetch 
and register operations. The execution circuitry 
performs at least the prefetch operation out-of-order 
with respect to the preceding instruction to prefetch 
data into the processor and subsequently separately 
executes the register operation to place the data into a 
register specified by the load instruction. In an 
embodiment in which the processor is an in-order machine, 
the register operation is performed in-order with respect 
to the preceding instruction. 

All objects, features, and advantages of the 
present invention will become apparent in the following 
detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of 
the invention are set forth in the appended claims. The 
invention itself however, as well as a preferred mode of 
use, further objects and advantages thereof, will best be 
understood by reference to the following detailed 
description of an illustrative embodiment when read in 
conjunction with the accompanying drawings, wherein: 

Figure 1 depicts an illustrative embodiment of
a data processing system with which the method and system
of the present invention may advantageously be utilized;

Figures 2A and 2B illustrate two alternative
embodiments of the translation of UISA load instructions
into separately executable PREFETCH and REGISTER
operations in accordance with the present invention; and

Figure 3 is an exemplary load data queue that
may be utilized to temporarily buffer load data in
accordance with the present invention.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENT

With reference now to the figures and in 
particular with reference to Figure 1, there is 
illustrated a block diagram of an exemplary embodiment of 
a data processing system with which the present invention 
may advantageously be utilized. As shown, the data 
processing system includes at least one processor, 
indicated generally at 10, which, as discussed further 
below, includes various execution units, registers, 
buffers, memories, and other functional units that are 
all formed within a single integrated circuit. Processor 
10 is coupled by a bus interface unit (BIU) 14 to a bus 
12 and other components of the data processing system, 
such as system memory 8 or a second processor 10 (not 
illustrated).

Processor 10 includes an on-chip multi-level 
cache hierarchy 16 that provides low latency access to 
cache lines of instructions and data that correspond to 
memory locations in system memory 8. In the depicted 
embodiment, cache hierarchy 16 includes separate level 
one (L1) instruction and data caches 13 and 15 and a
unified level two (L2) cache 17. An instruction 
sequencing unit (ISU) 20 requests instructions from cache 
hierarchy 16 by supplying effective addresses (EAs) of 
cache lines of instructions. In response to receipt of 
an instruction request, cache hierarchy 16 translates the 
provided EA into a real address and outputs the specified 
cache line of instructions to instruction translation
unit (ITU) 18. ITU 18 then translates
each cache line of instructions from a user instruction
set architecture (UISA) into a possibly different number
of internal ISA (IISA) instructions that are directly
executable by the execution units of processor 10. The
instruction translation may be performed, for example, by
reference to microcode stored in a read-only memory (ROM)
template. In at least some embodiments, the UISA-to-IISA
translation results in a different number of IISA
instructions than UISA instructions and/or IISA
instructions of different lengths than corresponding UISA
instructions.

Following instruction translation by ITU 18, 
ISU 20 temporarily buffers the IISA instructions until
the instructions can be dispatched to one of the 
execution units of processor 10 for execution. In the 
illustrated embodiment, the execution units of processor 
10 include integer units (IUs) 24 for executing integer 
instructions, a load-store unit (LSU) 26 for executing 
load and store instructions, and a floating-point unit 
(FPU) 28 for executing floating-point instructions. Each 
of execution units 24-28 is preferably implemented as an 
execution pipeline having a number of pipeline stages. 

During execution within one of execution units 
24-28, an instruction receives operands (if any) from, 
and stores data results (if any) to one or more registers 
within a register file coupled to the execution unit. 
For example, IUs 24 execute integer arithmetic and logic 
instructions by reference to general-purpose register
(GPR) file 32, and FPU 28 executes floating-point 
arithmetic and logic instructions by reference to 
floating-point register (FPR) file 34. LSU 26 executes 
load and store instructions to transfer data between 
memory (e.g., cache hierarchy 16) and either of GPR file
32 and FPR file 34. After an execution unit finishes 
execution of an instruction, the execution unit notifies 
instruction sequencing unit 20, which schedules 
completion of the instruction in program order. Upon 
completion of an instruction, the data results, if any, 
of the instruction form a portion of the architected 
state of processor 10, and execution resources allocated 
to the instruction are made available for use in the 
execution of a subsequent instruction. 

As noted above, much of the hardware and data 
flow complexity involved in processing load instructions 
in conventional processors is attributable to the 
execution of load and other instructions out-of-program
order. In particular, the design philosophy of many 
conventional processors that permit out-of-order 
execution of instructions is to execute load instructions 
as early as possible to place specified data into a 
register file so that subsequent instructions having a 
dependency upon the load data are less likely to stall 
due to memory access latency. The processor must then 
detect data hazards (e.g., store instructions targeting 
the same address that are earlier in program order, but 
later in execution order) with respect to the data and 
discard the load data from the register file (and 
instructions executed utilizing that load data) in the 
event that the load data is found to be stale. 
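
The hazard condition described above can be illustrated with a minimal sketch. It is not taken from any particular processor design; the function name `load_is_stale` and the record fields `addr`, `program_pos`, and `exec_cycle` are hypothetical names chosen for illustration only.

```python
def load_is_stale(load, stores):
    """Return True if any store hits the load's address, precedes the load
    in program order, and yet executed after the load did."""
    return any(
        s["addr"] == load["addr"]
        and s["program_pos"] < load["program_pos"]   # earlier in program order
        and s["exec_cycle"] > load["exec_cycle"]     # later in execution order
        for s in stores
    )

# Hypothetical example: the load ran in cycle 2, the older store in cycle 4.
load = {"addr": 0x100, "program_pos": 7, "exec_cycle": 2}
stores = [{"addr": 0x100, "program_pos": 5, "exec_cycle": 4}]
stale = load_is_stale(load, stores)
```

When the check succeeds, a conventional out-of-order processor must discard the load data and every instruction that consumed it, which is the complexity the present invention seeks to avoid.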

In accordance with the present invention, 
processor 10 simplifies the processing of UISA load 
instructions by translating at least some of these UISA 
load instructions into two separately executable IISA
instructions. These two IISA instructions are defined
herein as a PREFETCH instruction that, if necessary,
causes specified data to be prefetched from lower level
memory (e.g., L2 cache 17 or system memory 8) into L1 data
cache 15 and a REGISTER instruction that transfers data
specified by the UISA load instruction into a register
file.

Referring now to Figures 2A and 2B, there are 
depicted two alternative embodiments of the translation 
of UISA load instructions into separately executable 
PREFETCH and REGISTER instructions in accordance with the 
present invention. As illustrated in Figure 2A, in a 
first embodiment, ITU 18 translates a UISA load instruction
into two IISA LOAD instructions 40 and 42 that are
identical except for the value of a register operation
field 50. Thus, while LOAD instructions 40 and 42 have
matching opcode, register, and address fields 44, 46 and
48, register operation field 50 of LOAD instruction 40 is
reset to 0 to indicate a PREFETCH operation, and register
operation field 50 of LOAD instruction 42 is set to 1 to
indicate a REGISTER operation. A variation on this
embodiment that could be
implemented with or without instruction translation by
ITU 18 would be for a single LOAD instruction to be
supplied to ISU 20, and for ISU 20 to issue the LOAD
instruction twice for execution (e.g., from an
instruction buffer) with differing settings of register
operation field 50.
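
The Figure 2A encoding may be sketched as follows. This is an illustrative model only: the class name `IisaLoad`, its field names, and the helper `translate_load_2a` are hypothetical, as are the example operand strings. The only property drawn from the description is that the two LOAD instructions match in every field except a register operation field that is 0 for PREFETCH and 1 for REGISTER.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class IisaLoad:
    """Hypothetical model of a LOAD instruction as laid out in Figure 2A."""
    opcode: str     # opcode field 44
    register: str   # register field 46 (destination register)
    address: str    # address field 48 (operands used to form the address)
    reg_op: int     # register operation field 50: 0 = PREFETCH, 1 = REGISTER

def translate_load_2a(opcode, register, address):
    """Produce two LOADs that differ only in the register operation field."""
    prefetch = IisaLoad(opcode, register, address, reg_op=0)
    reg = replace(prefetch, reg_op=1)   # copy with only reg_op changed
    return prefetch, reg

pre, reg = translate_load_2a("LOAD", "r5", "r1+r2")
```

Because the two instructions share one opcode, the same sketch also covers the described variation in which a single LOAD is issued twice with differing field settings.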

Alternatively, as shown in Figure 2B, ITU 18 
may translate a UISA load instruction into distinct IISA
PREFETCH and REGISTER instructions 60 and 62,
respectively. As illustrated, IISA PREFETCH instruction
60 contains, in addition to an opcode field 64, at least
a target address field 66 identifying operands that may
be utilized to compute the memory address(es) from which
load data is to be retrieved. IISA REGISTER instruction
62, by contrast, has a different opcode specified in its
opcode field 64 and specifies in a register field 68 the
register(s) into which the load data are to be
transferred.
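
By way of contrast with Figure 2A, the Figure 2B format may be sketched with two distinct instruction types. The class names `Prefetch` and `Register`, the opcode strings, and the helper `translate_load_2b` are hypothetical, and only the fields expressly named in the description are modeled; an actual encoding may carry additional fields.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Prefetch:
    """Hypothetical model of PREFETCH instruction 60."""
    opcode: str                      # opcode field 64
    target_address: Tuple[str, ...]  # target address field 66 (address operands)

@dataclass(frozen=True)
class Register:
    """Hypothetical model of REGISTER instruction 62."""
    opcode: str                      # opcode field 64, holding a different opcode
    registers: Tuple[str, ...]       # register field 68 (destination register(s))

def translate_load_2b(address_operands, dest_registers):
    """Split one load into a PREFETCH and a REGISTER with distinct opcodes."""
    return (Prefetch("PRE", tuple(address_operands)),
            Register("REG", tuple(dest_registers)))

pre, reg = translate_load_2b(["r1", "r2"], ["r5"])
```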

By translating UISA instructions to IISA
instructions in this manner, memory access latency 
associated with load instructions can be masked as in 
complex out-of-order machines, even in processors of 
reduced complexity that execute instructions either in- 
order or only slightly out-of-order. As an example, an 
exemplary cache line of instructions fetched from cache 
hierarchy 16 may include the following UISA instructions 

ADD1 
SUB1 
MUL1 
MUL2 
ST 

SUB2 

LD 

ADD2 

where ADD1 is the earliest UISA instruction in program 
order, LD is a UISA load instruction, and ADD2 is the
latest instruction in program order and is an addition 
instruction dependent upon the load data. According to 
the embodiment depicted in Figure 2B, these UISA 
instructions may be translated into the following 
sequence of IISA instructions:

ADD1
SUB1
MUL1
MUL2
ST
SUB2
PRE
REG
ADD2

where PRE and REG denote separately executable IISA
PREFETCH and REGISTER instructions, respectively.
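
The translation of the exemplary instruction stream may be sketched as follows. The function name `translate_stream` is hypothetical, and the sketch assumes, for simplicity, that every instruction other than LD maps one-to-one.

```python
def translate_stream(uisa):
    """Expand each UISA load (LD) into PRE followed by REG; pass others through."""
    iisa = []
    for insn in uisa:
        if insn == "LD":
            iisa.extend(["PRE", "REG"])
        else:
            iisa.append(insn)
    return iisa

uisa = ["ADD1", "SUB1", "MUL1", "MUL2", "ST", "SUB2", "LD", "ADD2"]
iisa = translate_stream(uisa)
print(iisa)
```

The eight UISA instructions thus yield nine translated instructions, with PRE and REG occupying the position of the original load.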

If instruction sequencing unit 20 enforces in-order
execution, which is defined to mean that no
instruction that changes the state of an architected 
register can be executed prior to an instruction 
preceding it in program order, processor 10 can still 
enjoy the chief benefits of executing load instructions 
out-of-order, that is, masking memory access latency, 
without the concomitant complexity by speculatively 
executing the IISA PREFETCH instruction prior to at least
one instruction preceding it in program order. In this 
manner, cache hierarchy 16 can speculatively initiate
prefetching of the load data into L1 data cache 15 to
mask data access latency, while the REGISTER instruction
(which alters the architected state of processor 10) is
still performed in-order. Table I summarizes an
exemplary execution scenario, given the IISA instruction
stream discussed above and an embodiment of processor 10
in which ISU 20 is capable of dispatching and retiring
two instructions per cycle.

In the exemplary scenario depicted in Table I,
at the beginning of cycle 1, ISU 20 holds all nine of the
IISA instructions, for example, in a deep instruction
buffer that is preferably more than one cache line of
instructions in depth. In response to detecting a PRE
instruction available for dispatch in the instruction
buffer, ISU 20 dispatches the PRE instruction out-of-order
to LSU 26, for example, concurrent with the
dispatch of ADD1 to IU 24.

During cycle 2, ISU 20 also decodes and
dispatches the SUB1 and MUL1 instructions to IUs 24.
Meanwhile, IU 24 executes ADD1, and LSU 26 executes the
PRE instruction to calculate a speculative effective
address (EA) of the data to be loaded. This speculative
EA is then translated to a real address, for example, by
reference to a conventional data translation lookaside
buffer (TLB), and supplied to cache hierarchy 16 as a
prefetch request. Thus, if the real address hits in L1
data cache 15, then no further action is taken. However,
if the real address misses in L1 data cache 15, then the
real address will be furnished to L2 cache 17 as a
request address. In the event of a hit in L2 cache 17,
L2 cache 17 will load the associated data into L1 data
cache 15; however, if the real address misses in L2 cache
17, then a request containing the real address will be
sourced onto data bus 12 for servicing by system memory
8 or another processor 10. Thus, execution of the
PREFETCH instruction triggers prefetching of data into
cache hierarchy 16 (and preferably L1 data cache 15) that
is likely to be loaded into a register file in response
to execution of a REGISTER instruction. This prefetching
is speculative, however, in that an intervening branch
instruction may redirect the execution path, resulting in
the REGISTER instruction not being executed. In
addition, the contents of the registers utilized to
compute the EA of the load data may be updated by an
instruction executed between the PRE instruction and the
associated REG instruction. However, because the PRE
instruction merely affects the cache contents rather than
the architected state of processor 10, no corrective
action need be taken in the event of mis-speculation.
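
The handling of a prefetch request by the cache hierarchy may be sketched as follows. The caches are modeled, purely as an assumption for illustration, as dictionaries keyed by real address; the function name `prefetch` and the returned strings are hypothetical.

```python
def prefetch(real_addr, l1, l2):
    """Service a prefetch request and report where it was satisfied.
    Only cache contents change; no register state is touched."""
    if real_addr in l1:
        return "L1 hit"                 # no further action is taken
    if real_addr in l2:
        l1[real_addr] = l2[real_addr]   # L2 loads the data into L1
        return "L2 hit"
    return "bus request"                # request is sourced onto the bus

l1, l2 = {}, {0x100: b"data"}
first = prefetch(0x100, l1, l2)    # misses L1, hits L2, fills L1
second = prefetch(0x100, l1, l2)   # now hits L1
third = prefetch(0x200, l1, l2)    # misses both levels
```

Since the function modifies only the modeled cache contents, a mis-speculated call leaves nothing to undo, which mirrors the reasoning given above.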

Next, in cycle 3, ISU 20 completes the ADD1
instruction, and its result data become part of the
architected state of processor 10. As further shown in
Table I, the SUB1 and MUL1 instructions are executed by
IUs 24, and the MUL2 and ST instructions are decoded and
dispatched to IU 24 and LSU 26, respectively.

Assuming that the prefetch request missed in L1
data cache 15 and hit in L2 cache 17, during cycle 4
a copy of the prefetch data is loaded from L2 cache
17 into L1 data cache 15. The MUL2 and ST instructions
are also executed by an IU 24 and LSU 26, respectively.
In addition, ISU 20 completes the SUB1 and MUL1
instructions and decodes and dispatches the SUB2 and REG
instructions to an IU 24 and LSU 26, respectively. Thus,
as required by the in-order architecture of processor 10,
the REG instruction, which affects the architected state
of processor 10, is dispatched, executed and completed no
earlier than SUB2, the instruction preceding it in
program order.

Next, in cycle 5, the MUL2 and ST instructions
are completed by ISU 20, and the SUB2 and REG
instructions are executed by an IU 24 and LSU 26,
respectively. To execute the REG instruction, LSU 26
computes the EA of the load data and supplies the EA to
cache hierarchy 16, which translates the EA to a real
address and determines whether the load data associated
with that real address is resident in L1 data cache 15.
Because of the earlier speculative execution of the PRE
instruction, in most cases the load data is resident in
L1 data cache 15, and the REG instruction can both
execute and load data into one of register files 32 or 34
in the minimum data access latency permitted by cache
hierarchy 16, which in this case is a single cycle.

Thereafter, in cycle 6, the ADD2 instruction,
which is dispatched in cycle 5, is executed by one of IUs
24 concurrent with the completion of the SUB2 and REG
instructions by ISU 20. As illustrated, because the PRE
instruction speculatively prefetches the data required
for the ADD2 instruction prior to execution of the REG
instruction, the ADD2 instruction, which is dependent
upon the load data, is permitted to execute without any
added memory access latency. Finally, ISU 20 completes
the ADD2 instruction during cycle 7.
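
The cycle-by-cycle scenario narrated above may be reproduced with a small model. This is a sketch under stated assumptions rather than a reproduction of Table I: it assumes that the PRE instruction is hoisted to the front of the dispatch order, that two instructions are dispatched per cycle, and that each instruction executes one cycle after dispatch and completes one cycle after execution. The function name `schedule` is hypothetical. The completion cycle of PRE is not stated in the narrative and is therefore left unset.

```python
def schedule(iisa, width=2):
    """Assign dispatch/execute/complete cycles under the stated assumptions."""
    # PRE is dispatched out-of-order, ahead of all earlier instructions;
    # every other instruction keeps its program order.
    order = [i for i in iisa if i == "PRE"] + [i for i in iisa if i != "PRE"]
    table = {}
    for slot, insn in enumerate(order):
        dispatch = slot // width + 1          # 'width' instructions per cycle
        table[insn] = {
            "dispatch": dispatch,
            "execute": dispatch + 1,
            # Completion of PRE is not described in the text, so it is omitted.
            "complete": None if insn == "PRE" else dispatch + 2,
        }
    return table

iisa = ["ADD1", "SUB1", "MUL1", "MUL2", "ST", "SUB2", "PRE", "REG", "ADD2"]
table = schedule(iisa)
for insn in iisa:
    print(insn, table[insn])
```

Under these assumptions the model places REG in cycles 4 through 6 and ADD2 in cycles 5 through 7, in agreement with the narrative, while PRE executes in cycle 2, three cycles before the REG instruction reads the prefetched data.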

It should be evident to those skilled in the 
art that various modifications of the exemplary processor 
described herein are possible and may be desirable, 
depending upon other architectural considerations. For 
example, it may be desirable for instruction translation 
unit 18 to be merged into ISU 20. In addition, it may be 
desirable for a processor in accordance with the present 
invention to permit out-of-order execution of 
instructions other than memory access instructions (e.g., 
loads and stores), while requiring memory access
instructions to be executed strictly in order. In 
general, permitting non-memory-access instructions to 
execute out-of-order would not introduce any additional 
complexity as compared to in-order execution since 
conventional in-order processors include logic for 
detecting and observing register data dependencies 
between instructions. Moreover, a processor in 
accordance with the present invention may choose to
execute the PRE instruction by speculatively loading the 
data into buffer storage, rather than merely "priming" 
the cache hierarchy with a prefetch address. Buffering 
speculatively fetched load data in this manner is 
permitted even by in-order machines in that the content 
of the register files is not affected.

For example, Figure 3 illustrates a load data
queue 80 within LSU 26 that may be utilized to
temporarily buffer load data received from cache
hierarchy 16 in response to execution of a PREFETCH
instruction. As shown, each entry of load data queue 80
associates load data retrieved from cache hierarchy 16
with the target address (TA) from which the load data was
retrieved and the EA of the UISA load instruction, which
is shared by and flows through processor 10 in
conjunction with each of the PREFETCH and REGISTER IISA
instructions. Thus, when LSU 26 subsequently executes a
REG instruction, the EA of the UISA load instruction (and
thus the IISA REG instruction) forms an index into load
data queue 80 and the TA provides verification that the
speculatively calculated target address was correct.
Although implementing a load data queue such as that
depicted in Figure 3 may reduce access latency in some
implementations, the improvement in access latency
entails additional complexity in that store operations
and exclusive access requests by other processors must be
snooped against the load data queue to ensure
correctness.
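
A load data queue of the kind depicted in Figure 3 may be sketched as follows. The class and method names are hypothetical. Two behaviors are assumptions not stated in the description: an entry is consumed when it is looked up, and an entry is invalidated when a snooped store matches its target address.

```python
class LoadDataQueue:
    """Hypothetical model: entries indexed by the EA of the load instruction."""

    def __init__(self):
        self._entries = {}   # instruction EA -> (target address, load data)

    def fill(self, insn_ea, target_addr, data):
        """Buffer data fetched when the PREFETCH instruction executes."""
        self._entries[insn_ea] = (target_addr, data)

    def lookup(self, insn_ea, target_addr):
        """On REG execution, index by instruction EA and verify the TA.
        Returns None on a miss or when the speculative address was wrong."""
        entry = self._entries.pop(insn_ea, None)   # assumed: consume on use
        if entry is None or entry[0] != target_addr:
            return None
        return entry[1]

    def snoop_store(self, store_addr):
        """Assumed policy: drop every entry whose TA matches a snooped store."""
        stale = [ea for ea, (ta, _) in self._entries.items() if ta == store_addr]
        for ea in stale:
            del self._entries[ea]

q = LoadDataQueue()
q.fill(0x40, 0x1000, 7)
hit = q.lookup(0x40, 0x1000)        # TA verified, data returned
q.fill(0x44, 0x2000, 9)
q.snoop_store(0x2000)               # intervening store invalidates the entry
after_snoop = q.lookup(0x44, 0x2000)
```

When `lookup` returns None, the REG instruction would simply obtain its data from the cache hierarchy in the ordinary manner.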

In another embodiment of the present invention, 
it may be desired to permit the PREFETCH instruction to 
be issued and executed as early as possible, but still 
constrain the PREFETCH instruction to be executed without 
utilizing speculative address operands. That is, when 
dispatching instructions, ISU 20 would still advance the 
PREFETCH instruction as far as possible in execution 
order with respect to the REGISTER instructions, but 
processor 10 would enforce register data dependencies so 
that PREFETCH instructions would always use correct 
(i.e., non-speculative) register values when computing
the prefetch address. 
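
The non-speculative placement of the PREFETCH instruction may be sketched as follows. The function name `earliest_prefetch_slot`, the record fields `reads` and `writes`, and all register assignments in the example are hypothetical; the description does not specify which registers the exemplary instructions use.

```python
def earliest_prefetch_slot(program, pre_index):
    """Return the earliest position at which the PREFETCH may be placed so
    that every earlier writer of its address registers still precedes it."""
    sources = set(program[pre_index]["reads"])
    slot = 0
    for i in range(pre_index):
        if sources & set(program[i]["writes"]):
            slot = i + 1   # must follow this producer of an address operand
    return slot

# Hypothetical register usage, for illustration only.
program = [
    {"name": "ADD1", "reads": ["r6", "r7"], "writes": ["r1"]},
    {"name": "SUB1", "reads": ["r1"], "writes": ["r2"]},
    {"name": "MUL1", "reads": ["r8"], "writes": ["r3"]},
    {"name": "PRE", "reads": ["r2", "r9"], "writes": []},
]
slot = earliest_prefetch_slot(program, 3)
```

In this example the PREFETCH may be advanced past MUL1 but not past SUB1, which produces one of its address operands.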

As has been described, the present invention 
provides an improved processor and method of performing 
load operations that translate UISA load operations into 
separately executable prefetch and register operations. 
Because performing the prefetch operation does not affect 
the architected state of a processor, the prefetch 
operation can be performed speculatively to mask data 
access latency, even in in-order execution machines. The 
register operation can thereafter be performed in-order 
to complete the load operation. 

While the invention has been particularly shown 
and described with reference to a preferred embodiment, 
it will be understood by those skilled in the art that 
various changes in form and detail may be made therein 
without departing from the spirit and scope of the 
invention.

CLAIMS

What is claimed is:

1. A method of processing an instruction in a processor,
said method comprising:

fetching a sequence of instructions including
an instruction; and

translating the instruction into separately
executable prefetch and register operations,
wherein said prefetch operation obtains, in an out-of-order
fashion, data needed to execute said register
operation and said register operation performs an
operation in order.

2. A processor, comprising:

means for fetching a sequence of instructions
including an instruction; and

means for translating the instruction into
separately executable prefetch and register
operations, wherein said prefetch operation obtains, in
an out-of-order fashion, data needed to execute said
register operation and said register operation performs
an operation in order.

3. A processor, comprising:

a plurality of registers;

instruction processing circuitry that fetches a
load instruction and a preceding instruction that
precedes said load instruction in program order and,
responsive to detecting said load instruction, translates
said load instruction into separately executable prefetch
and register operations; and

execution circuitry that performs at least said
prefetch operation out-of-order with respect to said
preceding instruction to prefetch data and subsequently
separately executes said register operation to place said
data into a register among said plurality of registers
specified by said load instruction.

4. The processor of Claim 3, wherein said execution
circuitry executes said register operation in-order with
respect to said preceding instruction.

5. The processor of Claim 3, wherein said execution
circuitry executes said register operation out-of-order
with respect to said preceding instruction.

6. The processor of Claim 3, wherein said prefetch
operation and said register operation have a same
operation code.

7. The processor of Claim 6, wherein said prefetch
operation and said register operation differ only in a
value of a register operation field.

8. The processor of Claim 3, wherein said execution
circuitry stores said data prefetched in response to said
prefetch operation in a temporary register.

9. The processor of Claim 3, and further comprising a
data hazard detector that, in response to detection of a
hazard for said data, signals said processor to discard
said data and said register operation.

10. A method of performing a load operation in a
processor having a plurality of registers, said method
comprising:

fetching a load instruction and a preceding
instruction that precedes said load instruction in
program order;

detecting said load instruction and translating
said load instruction into separately executable prefetch
and register operations;

performing at least said prefetch operation
out-of-order with respect to said preceding instruction
to prefetch data; and

thereafter, separately executing said register
operation to place said data into a register among said
plurality of registers specified by said load
instruction.

11. The method of Claim 10, and further comprising
executing said register operation in-order with respect
to said preceding instruction.

12. The method of Claim 10, and further comprising
executing said register operation out-of-order with
respect to said preceding instruction.

13. The method of Claim 10, wherein translating said
load instruction comprises translating said load
instruction into prefetch and register operations that
have a same operation code.

14. The method of Claim 13, wherein said prefetch
operation and said register operation differ only in a
value of a register operation field.

15. The method of Claim 10, wherein performing said
prefetch operation comprises storing said data in a
temporary register.

16. The method of Claim 10, and further comprising:

detecting a data hazard for said data; and

in response to detection of said hazard for
said data, discarding said data and said register
operation.

ABSTRACT OF THE DISCLOSURE
PROCESSOR AND METHOD OF EXECUTING A LOAD INSTRUCTION THAT 
BIFURCATE LOAD EXECUTION INTO TWO OPERATIONS 

A processor implementing an improved method for 
executing load instructions includes execution circuitry, 
a plurality of registers, and instruction processing 
circuitry. The instruction processing circuitry fetches 
a load instruction and a preceding instruction that 
precedes the load instruction in program order, and in 
response to detecting the load instruction, translates 
the load instruction into separately executable prefetch 
and register operations. The execution circuitry 
performs at least the prefetch operation out-of-order 
with respect to the preceding instruction to prefetch 
data into the processor and subsequently separately 
executes the register operation to place the data into a 
register specified by the load instruction. In an 
embodiment in which the processor is an in-order machine, 
the register operation is performed in-order with respect 
to the preceding instruction.

As a below named inventor, I hereby declare that:

My residence, post office address and citizenship are as stated below next 
to my name; 

I believe I am the original, first and sole inventor (if only one name is 
listed below) or an original, first and joint inventor (if plural names are 
listed below) of the subject matter which is claimed and for which a patent is 
sought on the invention entitled 

PROCESSOR AND METHOD OF EXECUTING A LOAD INSTRUCTION THAT BIFURCATE LOAD 
EXECUTION INTO TWO OPERATIONS 

the specification of which (check one) 

X is attached hereto. 

was filed on 

as Application Serial No. 

and was amended on 

(if applicable) 

I hereby state that I have reviewed and understand the contents of the above 
identified specification, including the claims, as amended by any amendment 
referred to above. 

I acknowledge the duty to disclose information which is material to the 
patentability of this application in accordance with Title 37, Code of Federal 
Regulations, §1.56.

I hereby claim foreign priority benefits under Title 35, United States Code, §119
of any foreign application(s) for patent or inventor's certificate listed below
and have also identified below any foreign application for patent or inventor's
certificate having a filing date before that of the application on which
priority is claimed:

Prior Foreign Application(s): Priority Claimed

Yes No

(Number) (Country) (Day/Month/Year)

I hereby claim the benefit under Title 35, United States Code, §120 of any United
States application(s) listed below and, insofar as the subject matter of each of
the claims of this application is not disclosed in the prior United States 
application in the manner provided by the first paragraph of Title 35, United 
States Code, §112, I acknowledge the duty to disclose information material to 
the patentability of this application as defined in Title 37, Code of Federal 
Regulations, §1.56 which occurred between the filing date of the prior
application and the national or PCT international filing date of this 
application: 



I hereby declare that all statements made herein of my own knowledge are true and 
that all statements made on information and belief are believed to be true; and 
further that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, 
under Section 1001 of Title 18 of the United States Code and that such willful 
false statements may jeopardize the validity of the application or any patent 
issued thereon. 

POWER OF ATTORNEY: As a named inventor, I hereby appoint the following attorneys 
and/or agents to prosecute this application and transact all business in the 
Patent and Trademark Office connected therewith. 

John W. Henderson, Jr., Reg. No. 26,907; Thomas E. Tyson, Reg. No. 28,543; Robert 
M. Carwell, Reg. No. 28,499; Jeffrey S. LaBaw, Reg. No. 31,633; Douglas H. 
Lefeve, Reg. No. 26,193; Casimer K. Salys, Reg. No. 28,900; David A. Mims, Jr.,
Reg. No. 32,708; Volel Emile, Reg. No. 39,969; James H. Barksdale, Jr., Reg. No.
24,091; Anthony V. England, Reg. No. 35,129; Leslie A. Van Leeuwen, Reg. No. 
42,196; Marilyn S. Dawkins, Reg. No. 31,140; Mark E. McBurney, Reg. No. 33,114; 
Christopher A. Hughes, Reg. No. 26,914; Edward A. Pennington, Reg. No. 32,588; 
John E. Hoel, Reg. No. 26,279; Joseph C. Redmond, Jr., Reg. No. 18,753; Andrew 
J. Dillon, Reg. No. 29,634; Max Ciccarelli, Reg. No. 39,454; Daniel E. Venglarik, 
Reg. No. 39,409; Jack V. Musgrove, Reg. No. 31,986; Brian F. Russell, Reg. No. 
40,796; Steven Lin, Reg. No. 35,250; Matthew W. Baca, Reg. No. 42,277; Antony P. 
Ng, Reg. No. 43,427; John G. Graham, Reg. No. 19,563; Matthew S. Anderson, Reg. 
No. 39,093; Michael R. Barre, Reg. No. 44,023; Andrew Mitchell Harris, Reg. No. 
42,638; Richard McCain, Reg. No. 43,785; Michael Noe, Reg. No. 44,975; and 
Sidney L. Weatherford, Reg. No. P-45,602. 

Send correspondence to: Andrew J. Dillon, FELSMAN, BRADLEY, VADEN, GUNTER & 
DILLON, LLP, Suite 350 Lakewood on the Park, 7600B North Capital of Texas
Highway, Austin, Texas 78731, and direct all telephone calls to Andrew J. Dillon, 
(512) 343-6116. 



FULL NAME OF SOLE OR FIRST INVENTOR: Charles Robert Moore

RESIDENCE: 8802 Royalwood Drive
Austin, Texas 78750

CITIZENSHIP: U.S.A.

POST OFFICE ADDRESS: 8802 Royalwood Drive
Austin, Texas 78750



(Application Serial #) 



(Filing Date) 



(Status) 



INVENTOR'S SIGNATURE: